OSM Points of Interest assessment

Possible destinations for travel time indicators ?

The use of Points Of Interest (POI) from the OpenStreetMap (OSM) database appears promising, when appropriate OSM tags (defined by a key-values) are selected to get facilities and services for free all around the World. This could be used to identify potential centralities (commercial, cultural, recreational) and potential destinations for travel time and accessibility indicators over Europe. To go it that way, it requires first to identify the relevant categories of POI that can answer to policy needs. This is done in the first part of the technical document by associating OSM tags to some of the Rural Compass categories. This conceptual work done, a more in depth analysis of the OSM completeness is required. This is proposed at the scale of Occitanie by comparing the degree of completeness of 21 consolidated POI from OSM to an institutional database in France: the BPE. The completeness of the OSM database is also discussed at European scale with other approaches. The last section of the report is a methodological proposal for identifying clusters of interest, despite the lack of OSM contributions in some areas and for some POI. The result is a set of 1km grid cells that present interesting characteristics in term of concentration and diversity of facilities and amenities for rural territories. This is done for Occitanie and adapted to 2 living labs of the GRANULAR project: Upper Norrland (SE) and Ourense Province (SE). The resulting grid cells of interest are subject to caution, due to completeness issue and a methodological framework that could certainly be improved.
Author
Affiliation

Ronan Ysebaert, Marianne Guérois , Timothée Giraud, Nicolas Lambert, Matthieu Viry

Published

August 3, 2023

General objectives

Testing OSM Points of Interest

As introduced in the data and methods review document (Ysebaert et al. (2023)), several solutions can be considered for identifying destinations for computing travel times from 1km² European reference grid adapted to rural areas observation.

The easiest and the strongest in term of quality consists in considering POI provided by institutional databases, such as healthcare services provided by Eurostat. The use of such data does not imply any data preparation since an important work of data harmonization and documentation is realized GISCO on this dataset. This dataset is already used by Eurostat to evaluate travel time to the nearest hospital from 1km² grid cells in 2020, using non open source solution for travel time calculations (Tomtom). Results are afterwards aggregated at NUTS3 level to provide the share of the population living within 15 minutes driving time of a hospital (Eurostat (2023)). Thus, this dataset is becoming central in the European statistics landscape, there are important chances that this resource will be maintained, improved and updated in the future. However, the thematic scope appears limited in the context of rural areas accessibility assessment.

Another possibility may be considered by using polygon layers, also provided by institutions: The Urban Morphological Zones layer provides the geographical coverage of urban objects derived from the Corine Land Cover and give information regarding the access to little and medium sized urban facilities; EU protected areas are provided by the European Environmental Agency and can give useful information to access to natural areas of interest.

Code
# Import reference layers
proc_area <- st_read("data/CDDA_2022_v01_public.gpkg", layer = "ProtectedSite")
sites <- st_read("data/CDDA_2022_v01_public.gpkg", layer = "DesignatedArea")

# Import region
reg <- st_read("data/reg_occitanie.gpkg")
reg <- st_transform(reg, 3035)

# Keep only UICN categories
sites <- sites[sites$iucnCategory %in% c("Ia", "Ib", "II", "III", "IV", "V", "VI"),]
proc_area <- merge(proc_area[,c("cddaId", "siteName")], 
                   sites[,c("cddaId", "iucnCategory", "legalFoundationDate")], 
                   by = "cddaId")
proc_area <- st_transform(proc_area, 3035)
proc_area <- st_intersection(proc_area, reg)

# Simplify contours (rmapshaper library)
proc_area <- ms_simplify(proc_area, keep = 0.01)
st_write(proc_area, "data/protected_areas.gpkg")

Figure 1 displays protected areas covering the Occitanie region in France. Among the UICN classification, the categories dedicated to National Parks (cat. II) and-or protected Landscape/Seascape (cat.V) may be interesting. It would imply to evaluate the travel time to the nearest polygon border.

Code
proc_area <- st_read("data/protected_areas.gpkg", quiet = TRUE)
proc_area <- st_transform(proc_area, 4326)

proc_1 <- proc_area[proc_area$iucnCategory == "Ia",]
proc_2 <- proc_area[proc_area$iucnCategory == "II",]
proc_3 <- proc_area[proc_area$iucnCategory == "III",]
proc_4 <- proc_area[proc_area$iucnCategory == "IV",]
proc_5 <- proc_area[proc_area$iucnCategory == "V",]
proc_6 <- proc_area[proc_area$iucnCategory == "VI",]

leaflet() %>%
  addProviderTiles("OpenStreetMap.HOT") %>%
  addPolygons(data = proc_1, color = "#135210", fillOpacity = 0.7, weight = 1.2, 
              group = "Strict Nature Reserve (IUCN Ia)", 
              popup = paste0("<b>", proc_1$siteName, "</b>")) %>%
  addPolygons(data = proc_2, color = "#33a02c", fillOpacity = 0.7, weight = 1.2, 
              group = "National Park (IUCN II)",
              popup = paste0("<b>", proc_2$siteName, "</b>")) %>%
  addPolygons(data = proc_3, color = "#1f78b4", fillOpacity = 0.7, weight = 1.2, 
              group = "Natural Monument (IUCN III)",
              popup = paste0("<b>", proc_3$siteName, "</b>")) %>%
  addPolygons(data = proc_4, color = "#f0a630", fillOpacity = 0.7, weight = 1.2, 
              group = "Species Management Area (IUCN IV)", 
              popup = paste0("<b>", proc_4$siteName, "</b>"))%>%
  addPolygons(data = proc_5, color = "#e5f005", fillOpacity = 0.7, weight = 1.2, 
              group = "Protected Landscape (IUCN .V)",
              popup = paste0("<b>", proc_5$siteName, "</b>")) %>%
  addPolygons(data = proc_6, color = "#bf0826", fillOpacity = 0.7, weight = 1.2, 
              group = "Prot. Area with sust. use of nat. resources (IUCN VI)", 
              popup = paste0("<b>", proc_6$siteName, "</b>")) %>%
  addLayersControl(overlayGroups = c("Strict Nature Reserve (IUCN Ia)",
                                     "National Park (IUCN II)",
                                     "Natural Monument (IUCN III)",
                                     "Species Management Area (IUCN IV)",
                                     "Protected Landscape (IUCN .V)",
                                     "Prot. Area with sust. use of nat. resources (IUCN VI)"),
                   options = layersControlOptions(collapsed = FALSE)) %>% 
  hideGroup(c("Strict Nature Reserve (IUCN Ia)", "Natural Monument (IUCN III)", 
              "Species Management Area (IUCN IV)", 
              "Prot. Area with sust. use of nat. resources (IUCN VI))"))  
Figure 1: A Quarto document in RStudio. Code and output interleaved in the document, with the plot output appearing right underneath the code.

In term of thematic depth, the use of OpenStreetMap (OSM) would certainly be the most promising solution. This open and free initiative allows to get everything we want, if the data are correctly selected and prepared. It is moreover quite easy to import the information the user is looking for with dedicated software libraries or APIs.

Currently, several institutions consider OSM as potential data source provider for a set of basic amenities and facilities. This is for instance the case for platform Connaissance du Territoire. This institutional platform is a partnership project in Provence-Alpes-Côte-d’Azur region (PACA). It aims at producing and updating knowledge in an analytically and operational approach. This project includes French institutions (region, departments, INSEE, IGN) and local stakeholders. Among others, this initiative feeds a dedicated data portal : DataSud (Figure 2). Several datasets and geospatial files are centralized in this platform. One specific field of interest is the extraction and follow-up of OpenStreetMap Points of Interest located in the PACA region. Several topics of interest are highlighted and associated to adapted OSM tags.

Figure 2: OSM extracts available in the Datasud platform: points of interest in several format.

Behind each topic a set of spatial files are disseminated (GeoJSON, csv, kml, shp) including OSM POI and associated attributes. The data processing is documented (in French) in a dedicated GitHub repository. A presentation made in 2023 by the main contributor, Tony Eymery, summarizes all this workflow and the general objectives behind this initiative. It allows as a consequence for a user not specialized in the architecture of the OSM database to download and re-use easily its content.

For GRANULAR activities this initiative is especially interesting in several perspectives:

  • It shows a concrete case of use of crowd-sourcing data for feeding institutional data portal, supported by scripts and theoretically regularly updated.
  • It demonstrates the growing interest of regional stakeholders and institutions on this open data initiative.
  • It gives significant methodological input for the data processing of the OSM database related to Points of Interest (POI).
  • It makes the link between OSM database and tags (key-values) and political oriented topics of interest.

The adaptation of such approach should be considered for GRANULAR activities. This can allow the definition of POI clusters as potential destinations for travel time calculations. In the meantime, the implementation of this methodological framework should not be naive. A lot of studies reminding that the completeness issue has to be highly considered. To be considered, this approach requires before to set several conceptual and methodological aspects:

  • To identify and extract appropriate OSM geographical objects, regarding analytically purposes.
  • To Provide inputs on the completeness of the OSM database, and the POI selected especially.
  • All things being equal regarding the completeness of the OSM database,to init a methodological proposal for defining clusters of interest at European scale.

This technical document is mainly based on the region of belonging of Pays-Pyrénéees-Méditerrannée Living Lab: Occitanie (FR). At the end of the document results are extended to two other Living Labs: Ourense Province (ES) and Upper Norrland (SE).

R packages

This technical document uses several packages. First dedicated to import data: giscoR and eurostat (marginally) for European reference layers and statistics. osmextract is the core package to import massively OpenStreetMap (OSM) data without sizing limitations. ohsome is used to get statistics (number of POI included by NUTS units).

sf is the core library for spatial data handling. rmapshaper, reshape2 are punctually used. potential is used to evaluate the completeness of the OSM database in a spatial neighborhood.

The data visualization is based on leaflet (interactive maps), mapsf (thematic maps) and DT (dynamic tables).

Code
# Data import
library(giscoR) # Import European reference layers
library(eurostat) # Import European regional statistics
library(osmextract) # Extract POI from the OSM Database
library(ohsome) # Extract OSM statistics

# Data handling
library(sf) # Geospatial data handling 
library(rmapshaper) # Simplify contours
library(reshape2) # reshape dataframe
library(potential) # Compute potentials and equipotential area

# Displaying data
library(leaflet) # Interactive mapping
library(mapsf) # Thematic mapping
library(DT) # Display table with JS components

Linking OSM tags and Rural Compass

General approach

The Rural Compass (Figure 3), conceptualized by University of Wageningen (WP2, GRANULAR Project), aims at providing an analytically tool for rural-proofing local, national and EU policies ; and a dashboard of indicators for monitoring and evaluating policies produced.

Figure 3: GRANULAR Rural compass (version June 2023).

This conceptual tool is especially useful in this framework: Each OSM geographical object (nodes, ways, relations) is defined by tag built by a key-value combination commonly referred to as tags. The key is used to describe a topic, category or type of feature. The value provides detail for the key-specific feature. A first approach may consist by identifying relevant OSM tags answering to the Rural Compass Dimensions.

taginfo is a useful tool to obtain statistics related to the OSM database. It provides statistics on OSM tags count at World level: most common keys and values included in each key. This is especially useful to identify the most contributed OSM key-values pairs. 9 OSM keys have been considered, assuming that their characteristics are near from Rural Compass Dimensions of interest:

  • shop marks the location of a shop and the products that sells (5.7 million objects included in the OSM database in July 2023 in the World, and 289 documented values in the OSM Wiki).
  • tourism is used for places of specific interest to tourists (3.1 million objects and 28 documented values).
  • amenity describes important facilities for visitors and residents, such as toilets, telephones, banks, pharmacies and schools (2.4 million objects and 300 documented values).
  • sport describes important facilities for visitors and residents, such as toilets, telephones, banks, pharmacies and schools (2.4 million objects and 300 documented values).
  • historic key is used to tag features that still exist or of which traces are observable and that are of historic interest (1.7 million objects and 64 documented values).
  • office is used to map an office, a place preponderantly providing services, frequently selling them (1 million objects, 83 documented values).
  • healthcare is used to map facility that provides healthcare (750 000 objects, 31 documented values)
  • craft is used for a place producing or processing customized goods (285 000 objects, 100 documented values).
  • education is sometimes used to map schools and kindergartens (5700 OSM objects, 3 documented associate values)

Figure 4: Mains values associated to the amenity key (taginfo Website).

Figure 4 shows that 20.41 % of the OSM values associated to the amenity key corresponds to parkings. These values are obviously not interesting for GRANLULAR’s activities. This is not the same for other values such as restaurants (5.43 % of amenities contributions), schools (5.24 %) or cafes (2.13 %).

An attempt consisted for each of the these keys to identify the appropriate OSM key-values for GRANULAR centers of interest (Rural Compass). This selection / consolidation is based on several basic rules:

  • Do not consider “national-oriented” tags / misspelling / not universally accepted by the OSM community: key-values must have a minimum of 500 occurrences in the OSM database.
  • Rural amenities oriented: focus the selection on basic services, facilities and amenities. POI corresponding to services or facilities in high hierarchy (universities, etc.) are not considered.
  • Create consolidated categories: some heterogeneous OSM tags correspond to similar services (restaurant, fast-food, cafe, bar and pub for instance). The tagging of these OSM objects may induce confusions for contributors (Is a pub a restaurant or not ? Is a cafe also a bar or not ?) These values have been grouped into consolidated categories (restaurant/cafe/bar).
  • Link OSM tags to Rural Compass: Each kept OSM tag is linked to a Rural Compass dimension.
  • A same OSM tag can appear in several Rural Compass dimensions: this is for example the case for amenity=pharmacy which will appear both in the rural amenity and the rural well-being categories.

The result is a consolidated file that links OSM tags (key, value), consolidated categories and Rural Compass concept. Each Rural compass dimension considered is described in the sub-section below.

Code
amenity <- read.csv("data/osm_world/amenity.csv", na = "")
craft <- read.csv("data/osm_world/craft.csv", na = "")
education <- read.csv("data/osm_world/education.csv", na = "")
healthcare <- read.csv("data/osm_world/healthcare.csv", na ="")
historic <- read.csv("data/osm_world/historic.csv", na ="")
office <-  read.csv("data/osm_world/office.csv", na ="")
shop <- read.csv("data/osm_world/shop.csv", na ="")
sport <- read.csv("data/osm_world/sport.csv", na ="")
tourism <- read.csv("data/osm_world/tourism.csv", na ="")
cols <- c("key", "value", "count", "OSM_CONCEPT", "OSM_COMMENT", "GRANULAR.",
          "GRANULAR..1", "NAME_CONSO", "description")
df <- rbind(amenity[,cols], education[,cols],  healthcare[,cols], 
            historic[,cols], office[,cols], shop[,cols], sport[,cols], 
            tourism[,cols], craft[,cols])

tmp1 <- df[!is.na(df$GRANULAR.),]
tmp2 <- df[!is.na(df$GRANULAR..1),]

colnames(tmp1)[6] <- "granular"
colnames(tmp2)[7] <- "granular"

# GRANULAR center of interests
cols <- c("granular", "NAME_CONSO", "key", "value", "count")
granular <- rbind(tmp1[,cols], tmp2[,cols])
granular <- granular[order(granular$NAME_CONSO),]

# Out of GRANULAR centers of interest
tmp3 <- df[is.na(df$GRANULAR.),]
out <- tmp3[,c("key", "value", "count", "description")]

Rural amenities

Code
tmp <- granular[granular == "Rural amenities",]
tmp <- tmp[!is.na(tmp$granular),]

datatable(tmp, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

Rural amenities corresponds to basic shops and amenities. 29 consolidated POI (column NAME_CONSO in the tables below) corresponding to 47 OSM key-values pairs. The count column displays the number of OSM objects answering to the key-value query in the World in July 2023.

Public services

Code
tmp <- granular[granular == "Public services",]
tmp <- tmp[!is.na(tmp$granular),]

datatable(tmp, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20)))

Public services category includes the most common public services included in the OSM database. It corresponds to 11 consolidated Points of Interest corresponding to 14 OSM key-values pairs.

Recreational prospects

Code
tmp <- granular[granular == "Recreational prospects",]
tmp <- tmp[!is.na(tmp$granular),]

datatable(tmp, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

Recreational prospects includes equipments related to sport, musical activities and touristic activities (viewpoint, wine cellar, theme park, aquarium etc.). It groups 24 consolidated Points of Interest corresponding to 48 OSM key-values pairs.

Rural wellbeing

Code
tmp <- granular[granular == "Rural wellbeing",]
tmp <- tmp[!is.na(tmp$granular),]

datatable(tmp, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20)))

Rural well-being is a category that uses OSM POI on health and social (community center, co-working space, social center, market place). It groups 10 consolidated Points of Interest corresponding to 17 OSM key-values pairs.

Cultural attractiveness

Code
tmp <- granular[granular == "Cultural attractiveness",]
tmp <- tmp[!is.na(tmp$granular),]

datatable(tmp, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

Cultural attractiveness includes OSM POI that can have an historical or cultural interest (museums, ruins, etc.). It corresponds to 17 consolidated Points of Interest corresponding to 30 OSM key-values pairs.

Diversified economies

Code
tmp <- granular[granular == "Diversified economies",]
tmp <- tmp[!is.na(tmp$granular),]

datatable(tmp, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

Diversified economies includes a lot of OSM categories related to shop and manufacturing activities (46 consolidated Points of Interest corresponding to 58 OSM key-values pairs). The question we raise for this topic is the diversity of facilities in given areas.

Out of scope

The table below shows all the most common key-values (500 values contributions at least at World level) coming from the OSM keys amenity, education, healthcare, historic, office, shop, sport and tourism that have not been kept. These values are considered out of GRANULAR’s scope (bench, picnic sites, memorials, wayside crosses), or subject to confusions (nursing home, tourism information)

Code
datatable(out, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

OSM completeness in Occitanie (FR)

A reference institutional database: BPE

The Base Permanente des Équipements (BPE) is, as far we known, the only institutional database at European level that provides information (location, counts for statistical units) for a wide range of equipment and services, merchant or not, accessible to the public.

Produced by the French National Statistical Institute (INSEE (2012)), it covers 188 services and facilities in 2021. It is divided into seven main groups: personal services, businesses, education, health and social services, transport-travel, sports-leisure-culture and tourism. BPE is built from several administrative sources. Since September 2018, data on BPE have been released in evolution throughout the territory. They cover a limited number of types of facilities and two years spaced five years apart: 2012-2017, 2013-2018, 2014-2019, 2015-2020 and 2016-2021.

This is the core data source used within the ANCT report on little commercial centralities (Hilal et al. (2020)). One of the main results of the study is a rural typology based on services and levels of centrality (Figure 5): Municipalities (French LAU) are classified and hierarchized according to their level of centrality based on the diversity of facilities and services. The resulting typology is based on automated classification, combining dynamic and hierarchical clustering. It classifies LAU in four levels of centrality (local, intermediate, structuring and major centers of equipment and services).

Adapting and extending this typology at European level is obviously impossible, the methodology being quite oriented by this French data source and a specific territorial level of analysis: LAU level. However, identified centralities can be usefully compared to the methodological framework we propose based on OSM data. This will be done in the fourth part of this technical report.

Figure 5: ANCT typology on diversity of facilities and services, based on BPE data.

The OSM and the BPE databases do not follow the same structure and do not include the same POI, BPE being built from administrative data sources and OSM being fed by contributors. OSM mapping techniques being based on street views, photos, traces, aerial imagery, outdoor mapping and OSM contributor personal knowledge. In other terms, mapped objects must be visible. Thus, a significant number facilities included in the BPE cannot be considered by OSM contributors (home cares for elderly people) or French-system oriented (specialized schools, social reinsertion centers, etc.)

Figure 6: Documentation of BPE database (INSEE).

Nevertheless, some POI can be commonly and clearly identified both in the BPE and the OSM databases. A work has been consequently carried out to link BPE and OSM tags of interest, when possible and based on BPE (Figure 6) and OSM documentations.

OSM-BPE full and partial match

Code
cols <- c("key", "value", "GRANULAR.", "GRANULAR..1", "NAME_CONSO", 
          "description", "BPE")
df <- rbind(amenity[,cols], education[,cols],  healthcare[,cols], 
            historic[,cols], office[,cols], shop[,cols], sport[,cols], 
            tourism[,cols])

tmp1 <- df[!is.na(df$GRANULAR.),]
tmp2 <- df[!is.na(df$GRANULAR..1),]

colnames(tmp1)[3] <- "granular"
colnames(tmp2)[4] <- "granular"

# GRANULAR center of interests
cols <- c("granular", "NAME_CONSO", "key", "value", "description", "BPE")
df <- rbind(tmp1[,cols], tmp2[,cols])

full <- df[!is.na(df$BPE),]

datatable(full, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

45 consolidated categories coming from OSM have be linked to the BPE database. Some aggregates have been introduced into the BPE categories to make possible comparisons: the C11 category is a creation that groups BPE’s categories C101, C104, C201 and C301, which corresponds respectively to elementary school in France (6-10 years), secondary schools (11-15 years) and high schools (16-18 years). These categories have been aggregated into a single category in order to fit with OSM tagging practices.

A lot of OSM tags of interest does not fit with BPE centers of interest. It corresponds to specific shops and amenities (spa, ski rental), outdoor sport activities (free flying, canoe) or historical monuments in general.

Code
out <- df[is.na(df$BPE),]

datatable(out, rownames = FALSE, escape = FALSE, 
          class = "hoover",
          options = list(lengthMenu = c(10, 20, 50)))

Comparing OSM and BPE databases - data processing

This comparison is done for a set of POI of interest where correspondence with the BPE has been found. It corresponds to 42 OSM tags, 24 BPE categories and 21 consolidated categories.

The data preparation process is the following :

  • Import BPE’s POI overlapping Occitanie. Only POI with an identified matching with OSM POI are considered.
  • Selecting 1km grid overlapping Occitanie from GISCO and populated by more than 1 inhabitant in 2006, 2011, 2018 or 2021.
  • Import OSM tags of interest with the R package osmextract.
  • Join the associated BPE category to the OSM tag.

This is done once (due to time calculation).

Code
# Import French BPE 2021 geolocalized
bpe <- read.csv("data/bpe21_ensemble_xy.csv", sep = ";")

# Only Occitanie region
bpe <- bpe[bpe$REG == "76",] # 250 000 POI

# Only selected POI
bpe <- bpe[bpe$TYPEQU %in% unique(bpe_agg$BPE), ] # 98066 equipments
bpe <- bpe[,c("LAMBERT_X", "LAMBERT_Y", "TYPEQU")]
write.csv(bpe, "data/bpe_occitanie.csv", sep = ";", row.names = FALSE) # Export for further reading

# Extract 1km grid cells in Occitanie Region (FRJ)
grid_1k <- gisco_get_grid(resolution = "1", spatialtype = "REGION")
grid_1k <- grid_1k[grid_1k$NUTS2021_1 == "FRJ",]
grid_1k <- grid_1k[,c("GRD_ID", "TOT_P_2006", "TOT_P_2011", "TOT_P_2018", "TOT_P_2021")]

# Keep only populated grid cells
grid_1k <- grid_1k[grid_1k$TOT_P_2006 > 0 |
                 grid_1k$TOT_P_2011 > 0 |
                 grid_1k$TOT_P_2018 > 0 |
                 grid_1k$TOT_P_2021 > 0 ,]

st_write(grid_1k, "data/grid_1km_occitanie.gpkg") # Export for futher uses

# Function from https://github.com/cyipt/actdev/raw/main/code/get_POI.R
# Added sf_use_s2(FALSE)

# A funbction to extract POI from key-values
get_POI = function(
  region_name,
  et = c("shop", "amenity"),
  q = "SELECT * FROM 'points' WHERE shop IN ('supermarket')",
  ...
  ) {

  osm_points = osmextract::oe_get(region_name, query = q, extra_tags = et, 
                                  force_vectortranslate = TRUE)
  # names(osm_points)
  
  # get supermarket polygons
  q_poly = gsub(pattern = "points", replacement = "multipolygons", x = q)
  osm_polygons = osmextract::oe_get(region_name, query = q_poly, extra_tags = et, 
                                    force_vectortranslate = TRUE)
  
  # deduplicate supermarkets
  if(nrow(osm_polygons) > 0) {
    sf_use_s2(FALSE) # added to the original function
    osm_polygons <- st_make_valid(osm_polygons)  # added to the original function
    osm_points_in_polygons = osm_points[osm_polygons, ]
    # mapview::mapview(osm_points_in_polygons)
    osm_points_not_in_polygons = osm_points[
      !osm_points$osm_id %in% osm_points_in_polygons$osm_id,
    ] 
    
    # convert polygons to points and join together
    osm_points_not_in_polygons <- sf::st_transform(osm_points_not_in_polygons, 3035)
    osm_polygons <-  sf::st_transform(osm_polygons, 3035)
    osm_polygons_centroids = sf::st_centroid(osm_polygons)
    setdiff(names(osm_points), names(osm_polygons_centroids))
    names_in_both = intersect(names(osm_points), names(osm_polygons_centroids))
    osm_points = rbind(osm_points_not_in_polygons[names_in_both], 
                       osm_polygons_centroids[names_in_both])
  } 
  osm_points
}

# Extract amenities POI of interest
q = "SELECT * FROM 'points' WHERE amenity IN ('restaurant', 'fast_food', 'cafe',
'bank', 'pharmacy', 'bar', 'post_office', 'pub', 'driving_school', 
'veterinary', 'cinema')"
tmp1 <- get_POI(region_name = "Languedoc-Roussillon", q = q, et = "amenity")
tmp2 <- get_POI(region_name = "Midi-Pyrenees", q = q, et = "amenity")

# Extract healthcare POI of interest
q2 = "SELECT * FROM 'points' WHERE healthcare IN ('pharmacy')"
tmp3 <- get_POI(region_name = "Languedoc-Roussillon", q = q2, et = "healthcare")
tmp4 <- get_POI(region_name = "Midi-Pyrenees", q = q2, et = "healthcare")

# Extract shop POI of interest
q3 = "SELECT * FROM 'points' WHERE shop IN ('convenience', 
'supermarket', 'clothes', 'hairdresser', 'car_repair', 'bakery', 'beauty', 
'kiosk', 'butcher', 'florist', 'shoes', 'optician', 'jewelry', 'greengrocer', 
'books', 'department_store', 'laundry', 'cosmetics', 'dry_cleaning', 'newsagent', '
funeral_directors', 'garden_centre', 'farm', 'seafood', 'general', 'boutique', 
'grocery', 'frozen_food', 'food', 'hairdresser_supply')"
tmp5 <- get_POI(region_name = "Languedoc-Roussillon", q = q3, et = "shop")
tmp6 <- get_POI(region_name = "Midi-Pyrenees", q = q3, et = "shop")

# Extract shop POI of interest
q4 = "SELECT * FROM 'points' WHERE office IN ('newspapper')"
tmp7 <- get_POI(region_name = "Languedoc-Roussillon", q = q4, et = "office") # no results
tmp8 <- get_POI(region_name = "Midi-Pyrenees", q = q4, et = "office") # no results

# Combine results
am <- rbind(tmp1, tmp2)
ph <- rbind(tmp3, tmp4)
sh <- rbind(tmp5, tmp6)
am$key <- "amenity"
ph$key <- "healthcare"
sh$key <- "shop"
colnames(am)[6] <- "value"
colnames(ph)[6] <- "value"
colnames(sh)[6] <- "value"
cols <- c("osm_id", "name", "key", "value")
df <- rbind(am[,cols], ph[,cols], sh[,cols]) 
df <- merge(df, full[,c("value", "BPE")], by = "value", all.x = TRUE)

# Export for further uses
st_write(df, "data/osm_rural_amenities_occitanie.csv", 
         layer_options = "GEOMETRY=AS_XY", row.names = FALSE)

Completeness by type facilities (regional scale)

The table below shows the number of POI for Occitanie by consolidated categories. Using as a reference BPE POI, the less complete amenity is car repairs (10.89 of BPE data %). The most contributed is post office (192.9 %). This is certainly due to a confusion for contributors with the OSM tag amenity=post_box. Banks, pharmacies and cinemas are quite contributed. These results comply with previous analysis made on the field: in North-Eastern part of France in 2020, the completeness was 45 % for banks, 50.5% for pharmacies and 90.5 % for cinemas, using the BPE version 2018 as a reference (Guérois et al. (2020)).

Code
# Merge BPE category to OSM layer
osm <- merge(osm, bpe_agg[,c("BPE", "Name")], by = "BPE", all.x = TRUE)

# Count number of POI by consolidated categories (OSM)
osm_agg <- aggregate(osm[,"Name"], by = list(Name = osm$Name), FUN = length)

# Same for BPE
bpe <- merge(bpe, bpe_agg[,c("BPE", "Name")], by.x = "TYPEQU", by.y = "BPE",
             all.x = TRUE)
bpe_agg <- aggregate(bpe[,"Name"], by = list(Name = bpe$Name), FUN = length)

# Combine OSM and BPE counts
osm_agg <- merge(osm_agg, bpe_agg, by = "Name", all.x = TRUE)
colnames(osm_agg)[2:3] <- c("osm", "bpe")
osm_agg$osm_bpe_rt <- round(osm_agg$osm / osm_agg$bpe * 100, 2)

# View results
datatable(osm_agg, rownames = FALSE, escape = FALSE,  class = "hoover", 
          options = list(pageLength = 50))

Geographical completeness

The geographical completeness of OSM data is evaluated for the 21 selected services and facilities.

Code
# Manage projections and transform in WGS84
osm <- st_as_sf(osm, coords = c("X", "Y"), crs = 4326) %>%
  st_transform(2154)
bpe <- st_as_sf(bpe, coords = c("LAMBERT_X", "LAMBERT_Y"), crs = 2154)
grid_1k <- st_transform(grid_1k, 2154)
reg <- st_read("data/reg_occitanie.gpkg")
#> Reading layer `reg_occitanie' from data source 
#>   `C:\gitlab\GRANULAR\osm_poi\data\reg_occitanie.gpkg' using driver `GPKG'
#> Simple feature collection with 1 feature and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 428145.7 ymin: 6137101 xmax: 848021.3 ymax: 6439668
#> Projected CRS: RGF93 Lambert 93

# Keep only OSM points included in the Occitanie boundaries
inter <- st_intersects(osm, reg, sparse = FALSE)
osm <- osm[inter,]

# Count number of OSM and BPE objects in 1km grid square
inter <- st_intersects(grid_1k, bpe, sparse = TRUE)
grid_1k$bpe_nb <- sapply(X = inter, FUN = length)
inter <- st_intersects(grid_1k, osm, sparse = TRUE)
grid_1k$osm_nb <- sapply(X = inter, FUN = length)

# OSM completeness (% of overall BPE POI)
grid_1k <- grid_1k[grid_1k$bpe_nb > 0,]
grid_1k$OSM_BPE <- grid_1k$osm_nb / grid_1k$bpe_nb * 100

POI distribution is unequal over the space: a lot of POI are concentrated in metropolitan areas, few POI are distributed in countryside. Thus, input POI are smoothed using Stewart’s potential. OSM and BPE POI are aggregated in a regular grid of 2.5 km and smoothed using an inverse exponential function in a 5km span. This allows to qualify the OSM completeness in a geographical neighborhood. Completeness is also assessed in the 1km reference grid.

Code
# Potential completeness
osm$nb <- 1
bpe$nb <- 1
y <- create_grid(x = grid_1k, res = 2500)
y$osm <- mcpotential(x = osm, y = y,  var = "nb", fun = "e", span = 5000,
                   beta = 2, limit = 10000)
y$bpe <- mcpotential(x = bpe, y = y, var = "nb", fun = "e", span = 5000,
                   beta = 2, limit = 10000)
y$comp <- y$osm / y$bpe * 100

The overall completeness in Occitanie is equal to 42.41 %. Important differences can be observed over the region. For some rural areas, the completeness of OSM data is really poor, below 25 %. This analysis comply with the one made in other project on the North-Eastern part of France (Guérois et al. (2020)), where the data completeness was equal to 31 % for a selection of 15 other services.

Figure 7 displays interactively the results. It shows important differences at the scale of the region: main metropolitan areas (Toulouse, Montpellier) are characterized by completeness values between 50 and 66 %. The completeness index is comprised between 33 and 50 % for areas surrounding Nîmes and Perpignan. Low values completeness (below 20 %) for some specific areas: Bézier and the Toulouse neighboring area.At 1km grid level, most of the 1km grid cells are described by completeness indexes below 20 %. It is mainly due to the fact that most of these square grids includes a single BPE POI, and no OSM in the meantime. The completeness index shows its highest values in Pyrenees and in the neighborhood of natural reserves (Cévennes, in the North-Eastern part of the region). We can make the assumption that the touristic attraction operated by such natural areas scarcely populated can boost OSM contributions (few POI of high interest to contribute).

Code
# See max values inside Occitanie region for breaks
inter <- st_intersects(y, reg, sparse = FALSE)
y_reg <- y[inter,]
bks <- c(0, 20, 33, 50, 66, 75, 100, 125, 2145)

# Equipotential map
iso <- equipotential(x = y, var = "comp", breaks = bks, mask = reg)

# Layers in Long/lat (WGS84)
osm <- st_transform(osm, 4326)
bpe <- st_transform(bpe, 4326)
grid_1k <- st_transform(grid_1k, 4326)
iso <- st_transform(iso, 4326)

# Map parameters
cols <- c("#bd0026", "#f03b20", "#fd8d3c", "#feb24c",  "#fed976", "#ffffb2","#a1d99b", "#31a354")
pal <- colorBin(cols, domain = grid_1k$OSM_BPE, bins = bks)
pal2 <- colorBin(cols, domain = iso$min, bins = bks)
labels <- sprintf(
  "<strong>Pop (2021): </strong> %s<br/><strong>Completeness</strong>: %g (per 100)",
  grid_1k$TOT_P_2021, round(grid_1k$OSM_BPE,1)) %>% lapply(htmltools::HTML)

# Map
leaflet()%>% 
  addProviderTiles("OpenStreetMap.HOT")%>%
  addPolygons(data = iso, fillColor =  ~pal2(min), fillOpacity = 0.7, 
              weight = 1.2, group = "Completeness (pot)", stroke = FALSE)%>%  
  addPolygons(data = grid_1k, fillColor =  ~pal(OSM_BPE), fillOpacity = 0.7, 
              weight = 1.2, group = "Completeness (grid)", stroke = FALSE, label = labels,
              labelOptions = labelOptions(style = list("font-weight" = "normal", padding = "3px 8px"),
                                          textsize = "15px", direction = "auto"))%>% 
  addLegend(data=grid_1k, pal = pal, values = ~OSM_BPE, opacity = 0.7, 
            title = "Nb OSM/<br>BPE POI * 100 ", position = "bottomright")%>%  
  addCircleMarkers(data = bpe, group = "BPE (points)", color = "blue", radius = 4, fillOpacity = 1,
                   stroke = FALSE, popup = ~Name, clusterOptions = markerClusterOptions())%>%
  addCircleMarkers(data = osm, group = "OSM (points)", color = "red", radius = 4, fillOpacity = 1,
                   stroke = FALSE, popup = ~Name, clusterOptions = markerClusterOptions())%>%  
  addLayersControl(overlayGroups = c("Completeness (pot)", "Completeness (grid)", "BPE (points)", "OSM (points)"),
    options = layersControlOptions(collapsed = FALSE)) %>% 
  hideGroup(c("Completeness (grid)", "BPE (points)", "OSM (points)"))
#> Warning in pal(OSM_BPE): Some values were outside the color scale and will be
#> treated as NA